Attention Is All You Need
https://arxiv.org/abs/1706.03762
Transformer
解説
http://deeplearning.hatenablog.com/entry/transformer
Position-wise fully connected feed-forward network
(トークンの)位置ごとに独立したFFN
$ \operatorname{FFN}(x)=\max \left(0, x W_{1}+b_{1}\right) W_{2}+b_{2}